Abstract

This article presents a study that compares detected structural communities
in a coauthorship network to the socioacademic characteristics of the scholars
that compose the network. The coauthorship network was created from the
bibliographic record of a multi-institution, interdisciplinary research group
focused on the study of sensor networks and wireless communication. Four
different community detection algorithms were employed to assign a structural
community to each scholar in the network: leading eigenvector, walktrap,
edge betweenness and spinglass. Socioacademic characteristics were
gathered from the scholars and include such information as their academic
department, academic affiliation, country of origin, and academic position.
A Pearson's χ² test with Monte Carlo simulation revealed that structural
communities best represent groupings of individuals working in the same
academic department and at the same institution. A generalization of this
result suggests that, even in interdisciplinary, multi-institutional research
groups, coauthorship is primarily driven by departmental and institutional
affiliation.
Introduction

Scholarly and research environments have progressively become more diversified and 
interdisciplinary in nature. In interdisciplinary research groups, scholars with different re- 
search interests and backgrounds collaborate on different topics and reconcile their research 
work into joint endeavors (Qin, Lancaster, & Allen, 1997). This type of interaction is gen- 
erally seen as being driven by the need for a cross-fertilization of ideas when collaborating 
on topics that bridge disciplines (Niles, 1975; Beaver & Rosen, 1978). The mutual,
direct engagement among previously uncorrelated research topics has advantages not only
for the researchers, who are able to draw from a wider, more diverse intellectual environment,
but also for the research itself, which is circulated, validated and enriched by
contact with new research and social circles (Pierce, 1999).

Large, interdisciplinary, multi-institutional research centers are interesting environments
in which to study collaboration. The ensemble of social, academic and demographic
characteristics found in these centers has the potential to considerably affect collaboration patterns.
Research interests (Havemann, Heinz, & Kretschmer, 2006) and academic domain (Moody,
2004) are examples of such characteristics that have been reported to shape the way in
which individuals collaborate in a research environment. Coauthorship is a prominent
indicator of collaboration in scholarship. However, a number of other indicators of collaboration
have been identified and extensively studied and reviewed in the literature. Some of these 
indicators involve formal, recorded communication and are mostly analyzed by the use of 
bibliometric methods; besides coauthorship, these include broad categories of research in 
citation, co-citation and acknowledgment networks (see Borgman and Furner (2002) for an 
extensive review). Other indicators involve less tangible, informal forms of collaboration, 
such as electronic communication and physical proximity (Katz, 1994; Olson & Olson, 2000; 
Finholt, 2003; Pao, 1992). 

Coauthorship is the major focus of the work presented in this paper. Relevant to
this work are a number of studies that employ network analysis to study coauthorship
patterns in academic and scientific circles. Börner, Dall'Asta, Ke, and Vespignani (2005), for
example, posit a weighted graph approach to identify the local and global properties of a
scientific coauthorship network and to document the emergence of a novel field of science
(information visualization). Other domain-specific studies have mined bibliographic databases in
the fields of genetic programming (Tomassini, Luthi, Giacobini, & Langdon, 2007), library
science (Liu, Bollen, Nelson, & Van de Sompel, 2005) and neuroscience (Braun, Glänzel,
& Schubert, 2001), performing comparative analyses with other scientific collaboration
networks. Lorigo and Pellacini (2007) analyze a large document set of scholarly articles
in high energy physics and, rather than inspecting the macroscopic changes in the
collaboration patterns of this domain, focus on the impact of Internet-based collaborative
technologies on the evolution of remote collaborations. Cross-domain comparative analyses
are presented by Newman (2004), who analyzes large databases of papers in the fields
of physics, biology, and mathematics, exploring social and normative domain differences in
coauthorship behavior. All these network-based analyses have proved viable for a number
of visualization studies that employ graphical representations of coauthorship networks
to uncover macroscopic patterns that network analysis alone might fail to reveal (Börner,
Chen, & Boyack, 2005; Douglas, Montelione, & Gerstein, 2005).
In this paper, communities of coauthors in a multi-institutional interdisciplinary
research center are uncovered via the use of community detection algorithms. These
communities, called structural communities (or community structures), are made salient solely
by the topological characteristics of the network being analyzed. Structural communities
are "cliquish" subgraphs composed of groups of vertices that are highly connected among
themselves, but poorly connected to other vertices (Girvan & Newman, 2002). The study of
community structure in networks is particularly important because communities might display
local properties that differ greatly from the properties of the network as a whole. Even a
very detailed analysis of a network at a global level might fail to uncover specific patterns
and characteristics that only exist within tight-knit communities and sub-communities of
the network. Community detection algorithms have previously been applied to social
networks to uncover the relationship between nationality and collaboration (Lozano, Duch, &
Arenas, 2007), latent communities in large organizations (Tyler, Wilkinson, & Huberman,
2003), political and organizational structures (Porter, Mucha, Newman, & Friend, 2007),
and communities in networks of collaborating musicians (Gleiser & Danon, 2003;
Smith, 2006).

The aim of this article is to detect structural communities of coauthorship and com- 
pare them to socioacademic communities — the social and academic groupings of their 
constituent members. This comparative analysis uncovers the specific socioacademic char- 
acteristics that are best described by community structures in a coauthorship network. 
Four different community structure configurations are found, using the following commu- 
nity detection algorithms: leading eigenvector (Newman, 2006), walktrap (Pons & Latapy, 
2006), edge betweenness (Girvan & Newman, 2002) and spinglass (Reichardt & Bornholdt, 
2006). Subsequently, the community structures detected via the four different algorithms 
are subjected to Pearson's χ² test for statistical independence (Sheskin, 2004) to determine
whether the latent structural communities in the research group's coauthorship network are
dependent on (or independent of) the various socioacademic characteristics of the scholars.

Socioacademic and structural communities 

Community structure and many other social network indices are largely based on the 
topology of the network, rather than specific social, demographic or academic characteristics 
of the individuals in the network (i.e. socioacademic characteristics). In the research group 
analyzed in this study, scholars from different institutions and departments collaborated
to produce coauthored papers, books and conference proceedings.

Community detection algorithms are useful for capturing the structural communities
of the resulting coauthorship network. However, as coauthorship networks are essentially 
made up of scholars, the various socioacademic characteristics of these scholars can be 
regarded as parallel, socioacademic communities. Note that it is not necessary that these 
various structural and socioacademic communities overlap. For example, two scholars might 
be members of the same structural community in a coauthorship network, but have different 
academic affiliations and thus, be members of different institutional communities. However, 
when two communities do overlap in a non-random manner, much can be said about the 
semantics of the latent structural community. Before discussing the relationship between 
structural and socioacademic communities, this section will review the various types of 
communities present in the research group under study. 
Socioacademic communities

The research center analyzed here is the Center for Embedded Networked Sensing 
(CENS), a National Science Foundation venture involving scholars at all levels (faculty, 
scientists, engineers, graduate and undergraduate students) from five member institutions 
(University of California at Los Angeles, University of Southern California, University of 
California Riverside, California Institute of Technology, and University of California at 
Merced). The research conducted at CENS spans a wide spectrum of disciplines
and applications: from biology to seismology, from wireless telecommunications to
statistics.

The coauthorship network employed in this study was constructed from 560 
manuscripts (379 conference papers, 163 journal articles, 17 book chapters and 1 book) 
published over a ten-year period (1998-2007) by CENS scholars and available at the
CDL eScholarship repository (http://repositories.cdlib.org/cens). Only papers with at
least one author affiliated with CENS at the time of publication were considered.
Ultimately, the generated coauthorship network contained 291
vertices (scholars) and 2536 edges (coauthoring events). For every scholar in the network, 
details about the following socioacademic characteristics were gathered: academic depart- 
ment, academic affiliation, country of origin, and academic position. Table 1 presents the 
frequency distribution of the values for the aforementioned socioacademic characteristics. 
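
As an illustration of how a network of this kind can be derived from a bibliographic record, the following minimal Python sketch builds a weighted coauthorship graph from lists of author names. The function name `build_coauthorship_network`, the toy `papers` list, and the choice to add one unit of edge weight per coauthored paper are assumptions made for illustration; the article does not specify how coauthoring events were encoded.

```python
from collections import Counter
from itertools import combinations


def build_coauthorship_network(papers):
    """Return (vertices, edge_weights) for a list of per-paper author lists.

    Each unordered pair of authors on the same paper adds 1 to the weight
    of the edge between them (an assumed encoding, not the article's).
    """
    vertices = set()
    edge_weights = Counter()
    for authors in papers:
        unique_authors = sorted(set(authors))  # ignore duplicate names on a paper
        vertices.update(unique_authors)
        for a, b in combinations(unique_authors, 2):
            edge_weights[(a, b)] += 1
    return vertices, edge_weights


# Hypothetical bibliographic record: one author list per manuscript.
papers = [
    ["A", "B", "C"],
    ["A", "B"],
    ["C", "D"],
    ["E"],  # single-author paper: adds a vertex but no edge
]
vertices, edge_weights = build_coauthorship_network(papers)
```

Sorting the author names before pairing them keeps each undirected edge under a single key, so `("A", "B")` and `("B", "A")` are never counted separately.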



count   department               count   affiliation
113     Computer Science         148     UCLA
80      Electrical Engineering   66      USC
23      Civil Engineering        10      MIT
19      Biology                  8       Caltech
9       Information Science      7       UC Riverside
7       Environment              7       UC Berkeley
5       Education                4       UC Merced
4       Marine Biology           3       SUNY Stony Brook

count   origin                   count   position
120     United States            97      PhD student
33      India                    67      Other researcher
24      China                    44      Professor
10      Italy                    21      Postdoc
9       South Korea              21      Associate Professor
5       Australia                20      Assistant Professor
4       Greece                   5       Undergraduate Student
3       Iran                     3       Lecturer

Table 1: Frequency counts (top 8) for the socioacademic characteristics of the scholars under study:
academic department, academic affiliation, country of origin, and academic position.

Structural communities 

Structural communities relative to the coauthorship network studied here were identified
using four different community detection algorithms: (a) leading eigenvector, a recent
and popular technique based on expressing the modularity function in terms of the
eigenspectrum of matrices (Newman, 2006); (b) walktrap, a technique based on random
walks (Pons & Latapy, 2006); (c) edge betweenness, the earliest of the four techniques,
based on a generalization of vertex betweenness centrality to edges (Girvan & Newman, 2002);
and (d) spinglass, a technique based on a spin-glass model and simulated annealing
(Reichardt & Bornholdt, 2006).
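
The leading eigenvector method is described above as being built on the modularity function. A minimal sketch of the standard Newman-Girvan modularity score for an unweighted, undirected graph is given below, to make concrete what a "good" partition means in that setting. The function name `modularity` and the toy graph (two triangles joined by one edge) are illustrative assumptions; this is not the implementation used in the study.

```python
def modularity(edges, membership):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    Q = sum over communities c of  e_c / m  -  (d_c / (2 m)) ** 2,
    where m is the number of edges, e_c the number of edges inside c,
    and d_c the sum of the degrees of the vertices in c.
    """
    m = len(edges)
    intra = {}       # e_c: edges with both endpoints in community c
    degree_sum = {}  # d_c: total degree of community c
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_group = {v: 0 for v in range(6)}

q_two = modularity(edges, two_groups)  # 5/14, the natural split scores well
q_one = modularity(edges, one_group)   # 0.0, a single community carries no structure
```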

The structural communities found in the coauthorship network via the leading eigen- 
vector algorithm are diagrammed in Figure 1 according to the Fruchterman-Reingold net- 
work layout algorithm (Fruchterman & Reingold, 1991). The leading eigenvector community 
detection algorithm partitions the coauthorship network into 27 structural communities
and assigns a membership value to each node. Note that the membership value is a
nominal label that distinguishes communities; it does not express relative similarity between them.
Thus, scholars that are in the same structural community are given the same membership
value. In Figure 1, each membership value is rendered in a unique color (or
shade). The diameter of each vertex in the figure represents its weighted eigenvector
centrality score, where more central vertices have larger diameters (Bonacich,
1987). Finally, note that the community highlighted in the lower right portion of the figure 
is further analyzed in the next section. 







Figure 1. Structural communities in the coauthorship network under study, detected
according to the leading eigenvector algorithm. Each community is represented using a different
color (or shade). Vertex diameter represents the eigenvector centrality score of the vertex, where
more central vertices have larger diameters. The community highlighted in the lower right corner is 
further analyzed in Figure 3. 

Some fundamental statistics relative to the coauthorship network studied here are
presented in Table 2. The coauthorship network is found to be highly connected: it has
five connected components (distinctly separated subnetworks), and the vast majority of
the vertices lie within the largest component (280 out of a total of 291 vertices). This
means that, apart from the four small separate components, the interdisciplinary research group
studied here can be regarded, as a whole, as a single coauthoring community. Figure 2 presents
the number of scholars in each of the 27 structural communities identified by the leading
eigenvector community detection algorithm.
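
The component counts discussed above can be reproduced with a plain breadth-first search. The sketch below is illustrative only: the function names and the toy graph are assumptions, and `connectedness` uses one common definition (the fraction of vertex pairs joined by at least one path), which the article does not explicitly state. Under that definition, the component sizes reported in Table 2 (280, 4, 3, 2, 2) give a value that rounds to the reported 0.926.

```python
from collections import deque


def connected_components(vertices, edges):
    """Return the connected components (lists of vertices), largest first."""
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def connectedness(component_sizes):
    """Fraction of vertex pairs joined by a path (assumed definition)."""
    n = sum(component_sizes)
    reachable_pairs = sum(s * (s - 1) // 2 for s in component_sizes)
    return reachable_pairs / (n * (n - 1) // 2)


# Hypothetical toy graph: a path of four vertices, one pair, one isolate.
sizes = [len(c) for c in connected_components(range(7), [(0, 1), (1, 2), (2, 3), (4, 5)])]

# Component sizes as reported in Table 2 of the article.
reported = connectedness([280, 4, 3, 2, 2])
```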



variable                 value
vertices                 291
edges                    2536
cliques                  14
connected components     5 (280, 4, 3, 2, 2)
connectedness            0.926
γ coefficient            1.460
clustering coefficient   0.330

Table 2: A summary of some fundamental statistics for the coauthorship network under study.
Figure 2. The number of scholars in the 27 identified structural communities according to the
leading eigenvector community detection algorithm. 



In this network, as in most natural networks, there are few highly connected vertices
and many poorly connected vertices. This is made salient by the eigenvector centrality
scores represented by the diameter of the vertices in Figure 1. The clustering coefficient
found for the coauthorship network presented here (0.33) is similar to that found in
coauthorship networks in the field of mathematics (0.34), but much lower than what
is generally found in the physical sciences and biology (Newman, 2003). This suggests
less cliquish, sparser collaboration patterns among scholars and is perhaps indicative of an
interdisciplinary community that is more fragmented in its research agenda.
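
For readers unfamiliar with the measure, the sketch below computes a global clustering coefficient (transitivity): the fraction of connected triples of vertices that are closed into triangles. This is one of several definitions in use, and the article does not say which one produced the 0.33 figure, so the function name `transitivity`, the definition chosen, and the toy graph are all assumptions.

```python
from itertools import combinations


def transitivity(edges):
    """Global clustering coefficient: closed triples / all connected triples."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    closed = 0
    triples = 0
    for neighbours in adjacency.values():
        # Every pair of neighbours of a vertex forms a triple centred on it.
        for a, b in combinations(neighbours, 2):
            triples += 1
            if b in adjacency[a]:  # the triple is closed by an edge a-b
                closed += 1
    return closed / triples if triples else 0.0


# Hypothetical toy graph: a triangle {0, 1, 2} with a pendant vertex 3 attached to 2.
toy_clustering = transitivity([(0, 1), (1, 2), (0, 2), (2, 3)])  # 3 of 5 triples closed
```

A value near 1 indicates that coauthors of the same scholar have usually also written together; lower values indicate the sparser, less cliquish pattern described above.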
Results



In order to test for independence between the detected structural communities and
the explicit socioacademic communities of the coauthorship network under study, a
Pearson's χ² (i.e. chi-squared) analysis was conducted. For each of the four socioacademic
characteristics (academic department, academic affiliation, country of origin and academic
position), four contingency tables were created, one for each community detection algorithm
employed (leading eigenvector, walktrap, edge betweenness, and spinglass; see footnote 3). In each
of these tables, the columns (x-axis) represent the distinct community membership values identified
by the specific community detection algorithm used, and the rows (y-axis) represent the distinct
values of a particular socioacademic characteristic. Each cell of a contingency table
holds the number of observed occurrences of the corresponding x/y pair.
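
The construction just described can be sketched in a few lines of Python. The example below cross-tabulates community membership against one socioacademic characteristic, skipping scholars whose value is missing, and then computes Pearson's χ² statistic from the observed and expected cell counts. The function names, the use of `None` for null values, and the toy data are assumptions for illustration, not the study's actual code or data.

```python
from collections import Counter


def contingency_table(memberships, attributes):
    """Cross-tabulate community membership (columns) against an attribute (rows).

    Scholars whose attribute is None (a null value) are left out.
    """
    pairs = [(m, a) for m, a in zip(memberships, attributes) if a is not None]
    counts = Counter(pairs)
    columns = sorted({m for m, _ in pairs})
    rows = sorted({a for _, a in pairs})
    table = [[counts[(c, r)] for c in columns] for r in rows]
    return table, rows, columns


def chi_squared(table):
    """Pearson's chi-squared statistic: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical data: six scholars, two structural communities, one null department.
memberships = [0, 0, 0, 1, 1, 1]
departments = ["CS", "CS", "EE", "EE", "EE", None]
table, rows, columns = contingency_table(memberships, departments)
# table == [[2, 0], [1, 2]] with rows ["CS", "EE"] and columns [0, 1]
statistic = chi_squared(table)  # 20/9 for this toy table
```

Note how small the expected counts are even in this toy table (between 0.8 and 1.8); with 27 communities and several attribute values, sparse cells of this kind are hard to avoid.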

The contingency tables were then subjected to a Pearson's χ² test with Monte Carlo
simulation to determine whether each socioacademic property was dependent on or
independent of the assigned community membership value. The results of the χ² tests for
each socioacademic characteristic are presented in Table 3, where the p-value denotes the
probability of observing an association at least as strong as the one measured if the two
variables were in fact independent (i.e. a p-value greater than 0.05 is generally taken to
indicate statistical independence). The p-values obtained via the four community detection
algorithms all demonstrate that a scholar's department and affiliation are both dependent on
the identified structural community of the scholar. At the other end of the spectrum, the
academic position and country of origin of a scholar are independent of the identified
structural community of the scholar. The only exception to this latter result is the spinglass
community detection algorithm, which found academic position to be statistically significant.

It is important to note that the dataset under study lacks socioacademic values for 
some of the individuals in the coauthorship network; the number of null values for each
socioacademic characteristic is also displayed in Table 3. The absence of these values,
coupled with the size and configuration of the network, resulted in some of the contingency 
tables being sparsely populated (i.e. a high proportion of the cell expected values being 
equal to or 1). Table sparseness is a cause of concern for the validity of the χ² test
for independence. In order to remedy this situation, a Monte Carlo exact test was
conducted. This method has been recognized as a solution to table sparseness for categorical
multivariate data analysis (Agresti, 1992; Reiser & Lin, 1999).
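
A minimal sketch of a Monte Carlo test of this kind is shown below. It estimates the p-value by repeatedly shuffling one set of labels, which keeps both margins of the contingency table fixed, and counting how often the shuffled data yield a χ² statistic at least as large as the observed one. The function names, the number of replicates, the seed, and the toy data are assumptions; the article does not report the software or the settings actually used.

```python
import random
from collections import Counter


def chi_squared_from_labels(xs, ys):
    """Pearson's chi-squared statistic for two parallel lists of labels."""
    n = len(xs)
    x_totals = Counter(xs)
    y_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    statistic = 0.0
    for x, x_total in x_totals.items():
        for y, y_total in y_totals.items():
            expected = x_total * y_total / n
            statistic += (observed[(x, y)] - expected) ** 2 / expected
    return statistic


def monte_carlo_p_value(xs, ys, replicates=2000, seed=0):
    """Simulated p-value: share of label shuffles at least as extreme as observed."""
    rng = random.Random(seed)
    observed_stat = chi_squared_from_labels(xs, ys)
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(replicates):
        rng.shuffle(shuffled)  # breaks any association, keeps both margins
        # Small tolerance guards against floating-point summation differences.
        if chi_squared_from_labels(xs, shuffled) >= observed_stat - 1e-9:
            at_least_as_extreme += 1
    # The +1 terms count the observed table itself as one replicate.
    return (at_least_as_extreme + 1) / (replicates + 1)


# Hypothetical data: community labels perfectly aligned with an attribute ...
p_dependent = monte_carlo_p_value([0] * 10 + [1] * 10, ["a"] * 10 + ["b"] * 10)
# ... and labels that are exactly balanced across communities.
p_independent = monte_carlo_p_value([0, 1] * 10, ["a", "a", "b", "b"] * 5)
```

Because the observed table is counted among the replicates, the smallest p-value this procedure can return is 1 / (replicates + 1); it never reports exactly zero.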



socioacademic characteristic   null values   LEV      WT       EB       SG
Academic Department            5             0.0001   0.0005   0.0005   0.0005
Academic Affiliation                         0.0264   0.0007   0.0265   0.0005
Country of Origin              52            0.2403   0.4130   0.2293   0.0934
Academic Position              13            0.7166   0.1486   0.1672   0.0453

Table 3: p-values for the χ² tests of the contingency tables (academic department, academic
affiliation, country of origin and academic position) obtained via community detection algorithms:
leading eigenvector (LEV), walktrap (WT), edge betweenness (EB), spinglass (SG).

3 The spinglass community detection algorithm requires a connected network, therefore the analysis was
performed only on the largest connected component, which contained 280 of the total 291 nodes. Thus,
11 nodes were left out of this analysis.
Figure 3 presents an illustrative example of the aforementioned finding: the structural
community highlighted in the lower right portion of Figure 1. The four socioacademic
properties of the scholars have been identified and annotated: department, affiliation, country of
origin, and academic position. Returning to the χ² results in Table 3, one would expect
department and affiliation to match the membership community closely, and country
of origin and academic position to appear random. This prediction is borne out by the
visual investigation of the specific community presented in Figure 3.
(a) Academic Department   (b) Academic Affiliation   (c) Country of Origin   (d) Academic Position



Figure 3. Socioacademic characteristics of a specific structural community of the coauthorship net- 
work under study. In this example, community structure best represents department and affiliation 
and poorly represents origin and academic position. 



Figure 3a indicates the academic departments of the scholars in the community:
these are almost entirely scholars in computer science, with the exception of two electrical
engineers. Academic affiliation is slightly more varied (Figure 3b): three scholars
in the community belong to three different institutions, while all other scholars are from
UCLA. The other two views of the same community (Figures 3c and 3d) present more
fragmented scenarios: scholars come from five different countries and hold six different academic
ranks. This visual analysis is thus consistent with the quantitative findings of the χ² test.
Conclusion

The analysis presented in this article compared communities in the studied coau- 
thorship network identified by the community detection algorithms to communities made 
explicit in the socioacademic profile of the constituent community members. The structural 
communities detected and the socioacademic communities were compared using a
Pearson's χ² test with Monte Carlo simulation. Results demonstrated that the community
structures revealed by four different community detection algorithms (leading eigenvector, 
walktrap, edge betweenness and spinglass) all successfully capture specific existing socioa- 
cademic characteristics of individual scholars, namely academic department and affiliation. 
On the other hand, they fail to correctly capture demographic characteristics such as the 
individuals' country of origin. With the exception of the spinglass algorithm, all community 
detection methods indicate statistical independence for academic position. 

These results might be generalized in several ways. First, results might be used 
to make specific policy recommendations for the research group under study, the Center 
for Embedded Networked Sensing (CENS). The analysis has shown that coauthoring com- 
munity structures overlap with socioacademic communities sharing characteristics such as 
academic affiliation and department. In other words, the results reveal that most of the
coauthorship activity occurs within communities of scholars who share the same area of
expertise (department) and the same institution (affiliation). This finding is in contrast
with the scope and goals of interdisciplinary, multi-institutional research centers.

Second, the results could be extended to other coauthorship networks. Similar com- 
parative analyses could be performed on coauthorship networks of other scholarly domains 
to find out how different socioacademic characteristics describe the structural communities 
of diverse fields and/or research environments. In these studies, a number of additional 
social, academic and demographic parameters could be added to the socioacademic study, 
including physical location, first language, and research specialization. 

Finally, these types of studies might prove useful to predict either the topological
or the socioacademic configuration of a network when data on one of the two is scarce. For example,
the coauthoring community structures revealed in this study were found to be strongly
correlated with academic affiliation and department. Thus, it might be possible to infer
the academic affiliation and department of certain community members for which no data
is available, simply based on the topological community structure to which they belong.
Similarly, unknown coauthoring relationships might be inferred among individuals who,
though not part of the same community structure, share a set of socioacademic
characteristics.
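
The first kind of inference could, for instance, take the form of a majority vote within each structural community. The sketch below is only one possible reading of that idea: the function name `infer_missing`, the use of `None` for missing values, and the toy data are hypothetical, and the article does not propose or evaluate any specific procedure.

```python
from collections import Counter, defaultdict


def infer_missing(memberships, attributes):
    """Fill each missing attribute with the most common value in the same community.

    Values stay None when no member of the community has a known value.
    """
    known_by_community = defaultdict(Counter)
    for community, value in zip(memberships, attributes):
        if value is not None:
            known_by_community[community][value] += 1
    inferred = []
    for community, value in zip(memberships, attributes):
        if value is None and known_by_community[community]:
            value = known_by_community[community].most_common(1)[0][0]
        inferred.append(value)
    return inferred


# Hypothetical data: three communities; community 2 has no known affiliation at all.
memberships = [0, 0, 0, 1, 1, 2]
affiliations = ["UCLA", "UCLA", None, "USC", None, None]
filled = infer_missing(memberships, affiliations)
```

Such a rule would be expected to work only for characteristics found to depend on community membership (department and affiliation here), not for country of origin or academic position.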